<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  
  <meta name="author" content="PG-Strom Development Team">
  <link rel="shortcut icon" href="../img/favicon.ico">
  <title>Basic Operations - PG-Strom Manual</title>
  <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>

  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
  <link href="//fonts.googleapis.com/earlyaccess/notosansjp.css" rel="stylesheet">
  <link href="//fonts.googleapis.com/css?family=Open+Sans:600,800" rel="stylesheet">
  <link href="../custom.css" rel="stylesheet">
  
  <script>
    // Current page data
    var mkdocs_page_name = "Basic Operations";
    var mkdocs_page_input_path = "operations.md";
    var mkdocs_page_url = null;
  </script>
  
  <script src="../js/jquery-2.1.1.min.js" defer></script>
  <script src="../js/modernizr-2.8.3.min.js" defer></script>
  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
  <script>hljs.initHighlightingOnLoad();</script> 
  
</head>

<body class="wy-body-for-nav" role="document">

  <div class="wy-grid-for-nav">

    
    <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
      <div class="wy-side-nav-search">
        <a href=".." class="icon icon-home"> PG-Strom Manual</a>
        <div role="search">
  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
    <input type="text" name="q" placeholder="Search docs" />
  </form>

  [<a href="../ja" style="color: #cccccc">Japanese</a> | <strong>English</strong>]

</div>
      </div>

      <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
	<ul class="current">
	  
          
            <li class="toctree-l1">
		
    <a class="" href="..">Home</a>
	    </li>
          
            <li class="toctree-l1">
		
    <a class="" href="../install/">Install</a>
	    </li>
          
            <li class="toctree-l1">
		
    <span class="caption-text">Tutorial</span>
    <ul class="subnav">
                <li class=" current">
                    
    <a class="current" href="./">Basic Operations</a>
    <ul class="subnav">
            
    <li class="toctree-l3"><a href="#confirmation-of-gpu-off-loading">Confirmation of GPU off-loading</a></li>
    

    <li class="toctree-l3"><a href="#cpugpu-hybrid-parallel">CPU+GPU Hybrid Parallel</a></li>
    

    <li class="toctree-l3"><a href="#pullup-underlying-plans">Pullup underlying plans</a></li>
    

    </ul>
                </li>
                <li class="">
                    
    <a class="" href="../sys_admin/">System Administration</a>
                </li>
                <li class="">
                    
    <a class="" href="../brin/">Index Support</a>
                </li>
                <li class="">
                    
    <a class="" href="../partition/">Partitioning</a>
                </li>
                <li class="">
                    
    <a class="" href="../troubles/">Trouble Shooting</a>
                </li>
    </ul>
	    </li>
          
            <li class="toctree-l1">
		
    <span class="caption-text">Advanced Features</span>
    <ul class="subnav">
                <li class="">
                    
    <a class="" href="../ssd2gpu/">SSD2GPU Direct SQL</a>
                </li>
                <li class="">
                    
    <a class="" href="../arrow_fdw/">Arrow_fdw</a>
                </li>
                <li class="">
                    
    <a class="" href="../gstore_fdw/">Gstore_fdw</a>
                </li>
                <li class="">
                    
    <a class="" href="../plcuda/">PL/CUDA</a>
                </li>
    </ul>
	    </li>
          
            <li class="toctree-l1">
		
    <span class="caption-text">References</span>
    <ul class="subnav">
                <li class="">
                    
    <a class="" href="../ref_types/">Data Types</a>
                </li>
                <li class="">
                    
    <a class="" href="../ref_devfuncs/">Functions and Operators</a>
                </li>
                <li class="">
                    
    <a class="" href="../ref_sqlfuncs/">SQL Objects</a>
                </li>
                <li class="">
                    
    <a class="" href="../ref_params/">GUC Parameters</a>
                </li>
    </ul>
	    </li>
          
            <li class="toctree-l1">
		
    <a class="" href="../release_note/">Release Note</a>
	    </li>
          
        </ul>
      </div>
      &nbsp;
    </nav>

    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">

      
      <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
        <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
        <a href="..">PG-Strom Manual</a>
      </nav>

      
      <div class="wy-nav-content">
        <div class="rst-content">
          <div role="navigation" aria-label="breadcrumbs navigation">
  <ul class="wy-breadcrumbs">
    <li><a href="..">Docs</a> &raquo;</li>
    
      
        
          <li>Tutorial &raquo;</li>
        
      
    
    <li>Basic Operations</li>
    <li class="wy-breadcrumbs-aside">
      
    </li>
  </ul>
  <hr/>
</div>
          <div role="main">
            <div class="section">
              
                <h1>Basic operations</h1>

<h1 id="confirmation-of-gpu-off-loading">Confirmation of GPU off-loading</h1>
<p>You can use <code>EXPLAIN</code> command to check whether query is executed on GPU device or not.
A query is internally split into multiple elements and executed, and PG-Strom is capable to run SCAN, JOIN and GROUP BY in parallel on GPU device. If you can find out GpuScan, GpuJoin or GpuPreAgg was displayed instead of the standard operations by PostgreSQL, it means the query is partially executed on GPU device.</p>
<p>Below is an example of <code>EXPLAIN</code> command output.</p>
<pre><code>postgres=# EXPLAIN SELECT cat,count(*),avg(ax)
                     FROM t0 NATURAL JOIN t1 NATURAL JOIN t2
                    GROUP BY cat;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 GroupAggregate  (cost=989186.82..989190.94 rows=27 width=20)
   Group Key: t0.cat
   -&gt;  Sort  (cost=989186.82..989187.29 rows=189 width=44)
         Sort Key: t0.cat
         -&gt;  Custom Scan (GpuPreAgg)  (cost=989175.89..989179.67 rows=189 width=44)
               Reduction: Local
               GPU Projection: cat, pgstrom.nrows(), pgstrom.nrows((ax IS NOT NULL)), pgstrom.psum(ax)
               Combined GpuJoin: enabled
               -&gt;  Custom Scan (GpuJoin) on t0  (cost=14744.40..875804.46 rows=99996736 width=12)
                     GPU Projection: t0.cat, t1.ax
                     Outer Scan: t0  (cost=0.00..1833360.36 rows=99996736 width=12)
                     Depth 1: GpuHashJoin  (nrows 99996736...99996736)
                              HashKeys: t0.aid
                              JoinQuals: (t0.aid = t1.aid)
                              KDS-Hash (size: 10.39MB)
                     Depth 2: GpuHashJoin  (nrows 99996736...99996736)
                              HashKeys: t0.bid
                              JoinQuals: (t0.bid = t2.bid)
                              KDS-Hash (size: 10.78MB)
                     -&gt;  Seq Scan on t1  (cost=0.00..1972.85 rows=103785 width=12)
                     -&gt;  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
(21 rows)
</code></pre>

<p>You can notice some unusual query execution plans.
GpuJoin and GpuPreAgg are implemented on the CustomScan mechanism. In this example, GpuJoin runs JOIN operation on <code>t0</code>, <code>t1</code> and <code>t1</code>, then GpuPreAgg which receives the result of GpuJoin runs GROUP BY operation by the <code>cat</code> column on GPU device.</p>
<p>PG-Strom interacts with the query optimizer during PostgreSQL is building a query execution plan, and it offers alternative query execution plan with estimated cost for PostgreSQL's optimizer, if any of SCAN, JOIN, or GROUP BY are executable on GPU device.
This estimated cost is better than other query execution plans that run on CPU, it chooses the alternative execution plan that shall run on GPU device.</p>
<p>For GPU execution, it requires operators, functions and data types in use must be supported by PG-Strom.
It supports numeric types like <code>int</code> or <code>float</code>, date and time types like <code>date</code> or <code>timestamp</code>, variable length string like <code>text</code> and so on. It also supports arithmetic operations, comparison operators and many built-in operators.
See <a href="../ref_devfuncs/">References</a> for the detailed list.</p>
<h1 id="cpugpu-hybrid-parallel">CPU+GPU Hybrid Parallel</h1>
<p>PG-Strom also supports PostgreSQL's CPU parallel execution.</p>
<p>In the CPU parallel execution mode, Gather node launches several background worker processes, then it gathers the result of "partial" execution by individual background workers.
CustomScan execution plan provided by PG-Strom, like GpuJoin or GpuPreAgg, support execution at the background workers. They process their partial task using GPU individually. A CPU core usually needs much more time to set up buffer to supply data for GPU than execution of SQL workloads on GPU, so hybrid usage of CPU and GPU parallel can expect higher performance.
On the other hands, each process creates CUDA context that is required to communicate GPU and consumes a certain amount of GPU resources, so higher parallelism on CPU-side is not always better.</p>
<p>Look at the query execution plan below.
Execution plan tree under the Gather is executable on background worker process. It scans <code>t0</code> table which has 100million rows using four background worker processes and the coordinator process, in other words, 20million rows are handled per process by GpuJoin and GpuPreAgg, then its results are merged at Gather node.</p>
<pre><code># EXPLAIN SELECT cat,count(*),avg(ax)
            FROM t0 NATURAL JOIN t1
           GROUP by cat;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 GroupAggregate  (cost=955705.47..955720.93 rows=27 width=20)
   Group Key: t0.cat
   -&gt;  Sort  (cost=955705.47..955707.36 rows=756 width=44)
         Sort Key: t0.cat
         -&gt;  Gather  (cost=955589.95..955669.33 rows=756 width=44)
               Workers Planned: 4
               -&gt;  Parallel Custom Scan (GpuPreAgg)  (cost=954589.95..954593.73 rows=189 width=44)
                     Reduction: Local
                     GPU Projection: cat, pgstrom.nrows(), pgstrom.nrows((ax IS NOT NULL)), pgstrom.psum(ax)
                     Combined GpuJoin: enabled
                     -&gt;  Parallel Custom Scan (GpuJoin) on t0  (cost=27682.82..841218.52 rows=99996736 width=12)
                           GPU Projection: t0.cat, t1.ax
                           Outer Scan: t0  (cost=0.00..1083384.84 rows=24999184 width=8)
                           Depth 1: GpuHashJoin  (nrows 24999184...99996736)
                                    HashKeys: t0.aid
                                    JoinQuals: (t0.aid = t1.aid)
                                    KDS-Hash (size: 10.39MB)
                           -&gt;  Seq Scan on t1  (cost=0.00..1972.85 rows=103785 width=12)
(18 rows)
</code></pre>

<h1 id="pullup-underlying-plans">Pullup underlying plans</h1>
<p>PG-Strom can run SCAN, JOIN and GROUP BY workloads on GPU, however, it does not work with best performance if these custom execution plan simply replace the standard operations at PostgreSQL.
An example of problematic scenario is that SCAN once writes back its result data set to the host buffer then send the same data into GPU again to execute JOIN. Once again, JOIN results are written back and send to GPU to execute GROUP BY. It causes data ping-pong between CPU and GPU.</p>
<p>To avoid such inefficient jobs, PG-Strom has a special mode which pulls up its sub-plan to execute a bunch of jobs in a single GPU kernel invocation. Combination of the operations blow can cause pull-up of sub-plans.</p>
<ul>
<li>SCAN + JOIN</li>
<li>SCAN + GROUP BY</li>
<li>SCAN + JOIN + GROUP BY</li>
</ul>
<p><img alt="combined gpu kernel" src="../img/combined-kernel-overview.png" /></p>
<p>The execution plan example below never pulls up the sub-plans.</p>
<p>GpuJoin receives the result of GpuScan, then its results are passed to GpuPreAgg to generate the final results.</p>
<pre><code># EXPLAIN SELECT cat,count(*),avg(ax)
            FROM t0 NATURAL JOIN t1
           WHERE aid &lt; bid
           GROUP BY cat;
                              QUERY PLAN

--------------------------------------------------------------------------------
 GroupAggregate  (cost=1239991.03..1239995.15 rows=27 width=20)
   Group Key: t0.cat
   -&gt;  Sort  (cost=1239991.03..1239991.50 rows=189 width=44)
         Sort Key: t0.cat
         -&gt;  Custom Scan (GpuPreAgg)  (cost=1239980.10..1239983.88 rows=189 width=44)
               Reduction: Local
               GPU Projection: cat, pgstrom.nrows(), pgstrom.nrows((ax IS NOT NULL)), pgstrom.psum(ax)
               -&gt;  Custom Scan (GpuJoin)  (cost=50776.43..1199522.96 rows=33332245 width=12)
                     GPU Projection: t0.cat, t1.ax
                     Depth 1: GpuHashJoin  (nrows 33332245...33332245)
                              HashKeys: t0.aid
                              JoinQuals: (t0.aid = t1.aid)
                              KDS-Hash (size: 10.39MB)
                     -&gt;  Custom Scan (GpuScan) on t0  (cost=12634.49..1187710.85 rows=33332245 width=8)
                           GPU Projection: cat, aid
                           GPU Filter: (aid &lt; bid)
                     -&gt;  Seq Scan on t1  (cost=0.00..1972.85 rows=103785 width=12)
(18 rows)
</code></pre>

<p>This example causes data ping-pong between GPU and host buffers for each execution stage, so not efficient and less performance.</p>
<p>On the other hands, the query execution plan below pulls up sub-plans.</p>
<pre><code># EXPLAIN ANALYZE SELECT cat,count(*),avg(ax)
                    FROM t0 NATURAL JOIN t1
                   WHERE aid &lt; bid
                   GROUP BY cat;
                              QUERY PLAN
--------------------------------------------------------------------------------
 GroupAggregate  (cost=903669.50..903673.62 rows=27 width=20)
                 (actual time=7761.630..7761.644 rows=27 loops=1)
   Group Key: t0.cat
   -&gt;  Sort  (cost=903669.50..903669.97 rows=189 width=44)
             (actual time=7761.621..7761.626 rows=27 loops=1)
         Sort Key: t0.cat
         Sort Method: quicksort  Memory: 28kB
         -&gt;  Custom Scan (GpuPreAgg)  (cost=903658.57..903662.35 rows=189 width=44)
                                      (actual time=7761.531..7761.540 rows=27 loops=1)
               Reduction: Local
               GPU Projection: cat, pgstrom.nrows(), pgstrom.nrows((ax IS NOT NULL)), pgstrom.psum(ax)
               Combined GpuJoin: enabled
               -&gt;  Custom Scan (GpuJoin) on t0  (cost=12483.41..863201.43 rows=33332245 width=12)
                                                (never executed)
                     GPU Projection: t0.cat, t1.ax
                     Outer Scan: t0  (cost=12634.49..1187710.85 rows=33332245 width=8)
                                     (actual time=59.623..5557.052 rows=100000000 loops=1)
                     Outer Scan Filter: (aid &lt; bid)
                     Rows Removed by Outer Scan Filter: 50002874
                     Depth 1: GpuHashJoin  (plan nrows: 33332245...33332245, actual nrows: 49997126...49997126)
                              HashKeys: t0.aid
                              JoinQuals: (t0.aid = t1.aid)
                              KDS-Hash (size plan: 10.39MB, exec: 64.00MB)
                     -&gt;  Seq Scan on t1  (cost=0.00..1972.85 rows=103785 width=12)
                                         (actual time=0.013..15.303 rows=100000 loops=1)
 Planning time: 0.506 ms
 Execution time: 8495.391 ms
(21 rows)
</code></pre>

<p>You may notice that SCAN on the table <code>t0</code> is embedded into GpuJoin, and GpuScan gets vanished.
It means GpuJoin pulls up the underlying GpuScan, then combined GPU kernel function is also responsible for evaluation of the supplied WHERE-clause.</p>
<p>In addition, here is a strange output in <code>EXPLAIN ANALYZE</code> result - it displays <em>(never executed)</em> for GpuJoin.
It means GpuJoin is never executed during the query execution, and it is right. GpuPreAgg pulls up the underlying GpuJoin, then its combined GPU kernel function runs JOIN and GROUP BY.</p>
<p>The <code>pg_strom.pullup_outer_scan</code> parameter controls whether SCAN is pulled up, and the <code>pg_strom.pullup_outer_join</code> parameter also controls whether JOIN is pulled up.
Both parameters are configured to <code>on</code>. Usually, no need to disable them, however, you can use the parameters to identify the problems on system troubles.</p>
              
            </div>
          </div>
          <footer>
  
    <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
      
        <a href="../sys_admin/" class="btn btn-neutral float-right" title="System Administration">Next <span class="icon icon-circle-arrow-right"></span></a>
      
      
        <a href="../install/" class="btn btn-neutral" title="Install"><span class="icon icon-circle-arrow-left"></span> Previous</a>
      
    </div>
  

  <hr/>

  <div role="contentinfo">
    <!-- Copyright etc -->
    
  </div>

  Built with <a href="http://www.mkdocs.org">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
      
        </div>
      </div>

    </section>

  </div>

  <div class="rst-versions" role="note" style="cursor: pointer">
    <span class="rst-current-version" data-toggle="rst-current-version">
      
      
        <span><a href="../install/" style="color: #fcfcfc;">&laquo; Previous</a></span>
      
      
        <span style="margin-left: 15px"><a href="../sys_admin/" style="color: #fcfcfc">Next &raquo;</a></span>
      
    </span>
</div>
    <script>var base_url = '..';</script>
    <script src="../js/theme.js" defer></script>
      <script src="../search/main.js" defer></script>

</body>
</html>
